MapAnything is an end-to-end trained Transformer model that takes inputs in multiple modalities and directly regresses a factored metric 3D representation of the scene's geometry. The model supports more than 12 different 3D reconstruction tasks, including multi-image structure from motion (SfM), multi-view stereo, and monocular metric depth estimation.
Computer Vision
Safetensors
English